A robotic system for picking and placing objects from and into a constrained space

ABSTRACT

A system comprising: a database configured to store a multi-body model of a robot, the robot comprising a plurality of manipulators, and a plurality of joints and a plurality of actuators and actuator motors configured to move the joints, and wherein the multi-body model includes a kinematic and geometric model of each manipulator, a catalog of models for objects to be manipulated, the models comprising a current configuration and a target configuration, and a functional mapping of sensory data to configurations of the robot and the manipulators needed to manipulate the objects; at least one hardware processor coupled with the database; and one or more software modules that, when executed by the at least one hardware processor, receive sensory data from within a constrained space, identify objects in the constrained space based on the received sensory data and the catalog of models, determine a target pose for the joints and the manipulators based on the sensory data and the current and target configurations associated with the identified object, and compute joint space positions necessary to realize the target pose.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority under 35 U.S.C. 119(e) to U.S. Provisional Patent Application No. 62/882,395, filed Aug. 2, 2019, U.S. Provisional Patent Application No. 62/882,396, filed Aug. 2, 2019, and U.S. Provisional Patent Application No. 62/882,397, filed Aug. 2, 2019, the disclosures of which are hereby incorporated herein by reference in their entirety as if set forth in full.

This application is also related to U.S. patent application Ser. No. unassigned, filed on Jul. 30, 2020, entitled SYSTEMS AND METHODS FOR ROBOTIC CONTROL UNDER CONTACT, and is also related to U.S. patent application Ser. No. unassigned, filed on Jul. 30, 2020, entitled ROBOTIC MANIPULATORS, all of which are incorporated herein by reference in their entirety as if set forth in full.

BACKGROUND

1. Technical Field

The embodiments described herein are related to robotic control systems, and more specifically to a robotic software system for accurate control of robots that physically interact with various objects in their environments, while simultaneously incorporating the force feedback from these physical interactions into a “control policy”.

2. Related Art

It is currently very hard to build automated machines for manipulating objects of various shapes, sizes, inertias, and materials. Within factories robots perform many kinds of manipulation on a daily basis. They lift massive objects, move with blurring speed, and repeat complex performances with unerring precision. Yet outside of these carefully controlled robot realms, even the most sophisticated robot would be unable to perform many tasks that involve contact with other objects. Everyday manipulation tasks would stump conventionally controlled robots. As such, outside of controlled environments, robots have only performed sophisticated manipulation tasks when operated by a human.

Within simulation, robots have performed sophisticated manipulation tasks such as grasping multifaceted objects, tying knots, carrying objects around complex obstacles, and extracting objects from piles of entangled objects. The control algorithms for these demonstrations often employ search algorithms to find satisfactory solutions, such as a path to a goal state, or a configuration of a gripper that maximizes a measure of grasp quality against an object.

For example, many virtual robots use algorithms for motion planning that rapidly search for paths through a state space that describes the kinematics and dynamics of the world. Almost all of these simulations ignore the robot's sensory systems and assume that the state of the world is known with certainty. As examples, they might be provided with greater accuracy of the objects' states, e.g., positions and velocities, than is obtainable using state-of-the-art sensors, they might be provided with states for objects that, due to occlusions, are not visible to sensors, or both.

In a carefully controlled environment, these assumptions can be met. For example, within a traditional factory setting, engineers can ensure that a robot knows the state of relevant objects in the world to accuracy sufficient to perform necessary tasks. The robot typically needs to perform a few tasks using a few known objects, and people are usually banned from the area while the robot is working. Mechanical feeders can enforce constraints on the pose of the objects to be manipulated. And in the event that a robot needs to sense the world, engineers can make the environment favorable to sensing by controlling factors such as the lighting and the placement of objects relative to the sensor. Moreover, since the objects and tasks are known in advance, perception can be specialized to the environment and task. Whether by automated planning or direct programming, robots perform exceptionally well in factories or other controlled environments. Within research labs, successful demonstrations of robots autonomously performing complicated manipulation tasks have relied on some combination of known objects, easily identified and tracked objects (e.g., a bright red ball), uncluttered environments, fiducial markers, or narrowly defined, task specific controllers.

Outside of controlled settings, however, robots have only performed sophisticated manipulation tasks when operated by a human. Through teleoperation, even highly complex humanoid robots have performed a variety of challenging everyday manipulation tasks, such as grasping everyday objects, using a power drill, throwing away trash, and retrieving a drink from a refrigerator.

But accurate control of robots that autonomously, physically interact with various objects in their environments has proved elusive.

SUMMARY

Embodiments for a simple robotic arm design with a simple geometry and that employs simple kinematics of movement are described herein.

According to one aspect, a system comprises: a database configured to store a multi-body (typically multi-rigid body) model of a robot, the robot comprising a plurality of manipulators (which need not be physically connected to one another), and a plurality of joints and a plurality of actuators and actuator motors configured to move the joints, and wherein the multi-body model includes a kinematic and geometric model of each manipulator, a catalog of models for objects to be manipulated, the models comprising a current configuration and a target configuration, and a functional mapping of sensory data to configurations of the robot needed to manipulate the objects; at least one hardware processor coupled with the database; and one or more software modules that, when executed by the at least one hardware processor, receive sensory data from within a constrained space, identify objects in the constrained space based on the received sensory data and the catalog of models, determine a target pose for the joints and the manipulators based on the sensory data and the current and target configurations associated with the identified object, and compute joint space positions necessary to realize the target pose.

These and other features, aspects, and embodiments are described below in the section entitled “Detailed Description.”

BRIEF DESCRIPTION OF THE DRAWINGS

Features, aspects, and embodiments are described in conjunction with the attached drawings, in which:

FIG. 1 is a diagram illustrating an example environment in which a robot can be controlled in accordance with one embodiment;

FIG. 2 is a block diagram illustrating an example wired or wireless system that can be used in connection with various embodiments described herein; and

FIG. 3 is a diagram illustrating an example robot that can be used in the environment of FIG. 1 and controlled in accordance with one embodiment.

DETAILED DESCRIPTION

Systems, methods, and non-transitory computer-readable media are disclosed for robot control. The disclosure and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments and examples that are described and/or illustrated in the accompanying drawings and detailed in the following description. It should be noted that the features illustrated in the drawings are not necessarily drawn to scale, and features of one embodiment can be employed with other embodiments, even if not explicitly stated herein. Descriptions of well-known components and processing techniques may be omitted so as to not unnecessarily obscure the embodiments of the disclosure. The examples used herein are intended merely to facilitate an understanding of ways in which the disclosure can be practiced and to further enable those of skill in the art to practice the embodiments of the disclosure. Accordingly, the examples and embodiments herein should not be construed as limiting the scope of the disclosure. Moreover, it is noted that like reference numerals represent similar parts throughout the several views of the drawings.

This innovation concerns software for controlling machines to accurately manipulate objects that can be modeled by multi-body kinematics and dynamics (e.g., bottles, silverware, chairs). With such software, robots can be made to move furniture, pick up a pen, or use a wrench to tighten a bolt.

The system can be applied to any robot for which a multi-body dynamics model of the robot is available. The model can be built using a combination of CAD/CAE and system identification to determine best-fit parameters through physical experimentation. This includes legged robots, humanoid robots, and manipulator robots, among others. Multi-body dynamics studies the movement of systems of interconnected bodies under the action of external forces. The dynamics of a multi-body system are described by the laws of kinematics and by the application of Newton's second law (kinetics) or their derivative form Lagrangian mechanics. The solution of these equations of motion provides a description of the position, the motion and the acceleration of the individual components of the system and overall the system itself, as a function of time.
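For reference, the equations of motion described above are commonly written in the following standard form. The symbols are not taken from this disclosure and are assumed here purely for illustration: q is the vector of joint positions, M(q) is the joint-space inertia matrix, C(q, q̇) collects the Coriolis and centrifugal terms, g(q) is the gravity term, τ is the vector of actuator forces and torques, and J(q)ᵀ f_ext is the joint-space effect of external (e.g., contact) forces.

```latex
M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + g(q) = \tau + J(q)^{\top} f_{\mathrm{ext}}
```

Solving this equation for the acceleration and integrating over time yields the positions and velocities of the individual bodies as functions of time.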

FIG. 1 is a diagram illustrating an example environment 100 in which the systems and methods described herein can be implemented. As can be seen, the system can use at least one RGB-D (color+depth) camera 105 to be aimed into the workspace 102 that the robot 104 will be operating in. The camera can be used for example to determine pose for various objects 108. Alternatively, poses can be determined or estimated using radio/electromagnetic wave triangulation, motion capture, etc.

In certain embodiments, the camera 105 may instead be a laser imaging device such as a LiDAR, a device that measures distances by illuminating the target with laser light and measuring the reflection with a sensor. Differences in laser return times and wavelengths for many successive or simultaneous illuminations can then be used to make digital 3-D representations of a target. The digital 3-D representation provided by a LiDAR (a “point cloud” or “depth image”) may be provided as input to computer imaging software to determine the position of the robot's links 104 or to identify the geometry of and locate objects 108 or other features in the environment 102.

In certain embodiments, every joint 106 of the robot 104 is instrumented with a position sensor (not shown), such as an optical encoder. Further, a force/torque sensor (not shown), such as a 6-axis force/torque sensor, can be used to sense forces acting on the robot's links 106. The sensors can be placed inline between two links affixed together. Alternatively, tactile skins over the surface of the robot's rigid links can be used to precisely localize pressures from contact arising between the robot and objects (or people) in the environment.

In certain embodiments, every joint 106 of the robot 104 may be controlled via positional and velocity commands issued to electromagnetic servo motors that are constructed in series with a force measurement device such as a strain gauge or a spring (as in a series elastic actuator). The functional mapping from torque output along the axis of motion to accelerations of key points on the robot's kinematic structure is stored, and used in concert with a calibrated model of the robot for effective impedance control.

In certain embodiments, every joint 106 of the robot 104 may be controlled via flow rate or positional commands in a hydraulic or pneumatic system that affects the mechanical work and force output of a “cylinder” by a valve connected to a fluid pressure source. A “cylinder” in this embodiment is any hollow, pressure-sealed chamber that may be permitted to expand or contract in volume along certain degrees of freedom in response to changes in its internal pressure, according to the combined gas law and similar laws in fluid mechanics. The functional mapping from torque output along the axis of motion to accelerations of key points on the robot's kinematic structure is stored, and used in concert with a calibrated model of the robot for effective impedance control.

As noted, in certain embodiments, the chopstick degrees of freedom can be actuated via a kinematic chain connected by a set of links as in 104. Certain embodiments might instead control the motion of the chopstick via suspending the end effector link on a platform driven by one or more, multiple-degree-of-freedom actuators. A “Delta robot” provides three degrees-of-freedom of motion and, for six degree-of-freedom motion, a “Stewart Platform” or other “cable-driven robot” spatial actuators might control the spatial motion and force imparted by the chopstick end effector.

The chopstick end effector can be implemented as a cone with a wide aperture. Where a long, thin chopstick might more easily fit into gaps between individual objects 108 when singulating items from clutter, a wider aperture cone that can rotate about its longitudinal axis can roll to reposition objects in its grip, where the center of rotation would be at the point of the conical geometry. Such an embodiment would be able to simultaneously maintain a firm grasp on objects while re-orienting them in its grasp.

The chopstick end effector can be implemented with anisotropic or variable friction along its surface. An example of anisotropic friction in nature is the scales of a fish that are smooth in one direction and rough in the opposite. This embodiment would provide a customizable grip force along the length of the chopstick to permit rotation or sliding in some directions to reposition an object 108 in the robot's grasp while maintaining a firm hold on objects.

The camera(s) 105 and sensors 106 can be wired and/or wirelessly communicatively coupled to a back end server or servers, comprising one or more processors running software as described above and below, which in turn can be wired and/or wirelessly communicatively coupled with one or more processors included in robot 104. The server processors run various programs and algorithms 112 that identify objects 108, within images of workspace 102, that the system has been trained to identify. For example, a camera image may contain a corrugated box, a wrench, and a sealed can of vegetables, all of which can be identified and added to a model containing the objects in the camera's sightline and the robot's vicinity. Server(s) 110 can be local or remote from workspace 102. Alternatively, the one or more programs/algorithms can be included in robot 104 and can be run by the one or more processors included in robot 104.

Some sensors 106 can include an operator safety-focused “light curtain” or similar system that activates when a person enters the operational workspace of the robot. When such an event occurs, an independent safety controller deployed on an embedded circuit intervenes in the robot's power supply, limiting supplied voltages and currents to prevent the robot from exceeding safety-approved operation speeds and output forces. The control system can also respond to this intervention by behaving differently, considering that a human operator is present and close enough to physically interact with the robot while it is in motion. Similar behaviors can be designed with more advanced computer vision systems that map occupancy of the robot's workspace and prevent the robot from entering areas of its workspace that are occupied by a person or item that should not be physically interacted with.

The programs/algorithms 112 can include deep neural networks that perform bounding box identification on camera 105 (RGB) images, demarcating with an overlaid box every object that the system has been trained to manipulate and that is observed in a particular image. This software can also encompass software that is specialized at identifying certain objects, like corrugated cardboard boxes, using algorithms like edge detectors, and at using multiple camera views in order to get the 3D position of points in the 2D camera image.
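As a minimal sketch of how two camera views can yield the 3D position of a point seen in 2D images, the following assumes an idealized, rectified and calibrated stereo pair. The function name, the pinhole parameters (focal_px, cx, cy), and the baseline are assumptions made for illustration and are not specified in this disclosure.

```python
def triangulate_rectified(u_left, v_left, u_right, focal_px, baseline_m, cx, cy):
    """Return (X, Y, Z) in meters, in the left camera frame.

    Assumes a rectified stereo pair: both cameras share the same focal
    length (in pixels) and principal point (cx, cy), and matching points
    lie on the same image row.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity   # depth from similar triangles
    x = (u_left - cx) * z / focal_px        # back-project the pixel column
    y = (v_left - cy) * z / focal_px        # back-project the pixel row
    return x, y, z
```

A pixel matched at column 350 in the left image and column 320 in the right image, with a 600 pixel focal length and a 0.1 m baseline, would be placed about 2 m in front of the camera.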

When objects are unique, e.g., a 12 oz can of Maxwell House coffee, the 2D bounding box from a single camera is sufficient to estimate the object's 3D pose. When objects instead belong to a class, e.g., corrugated cardboard box, such that object sizes can vary, multiple camera views, e.g., from a calibrated stereo camera setup, are needed to establish correspondences between points in the 2D camera images and points in 3D. State-of-the-art techniques for training these neural networks use domain randomization to allow objects to be recognized under various lighting conditions, backgrounds, and even object appearances. Function approximation, e.g., deep neural networks trained on synthetic images, or a combination of function approximation and state estimation algorithms, can be used to estimate objects' 3D poses, or to estimate the value of a different representation, like keypoints, that uniquely determines the location and orientation of essentially rigid objects from RGB-D data. For example, a Bayesian filter (like a “particle filter”) can fuse the signals from force sensors with the pose estimates output from a neural network in order to track the object's state, i.e., its position and velocity.
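The fusion step can be illustrated with a deliberately simplified sketch. It substitutes a one-dimensional linear Kalman filter (another Bayesian filter) for the particle filter named above: the sensed force drives the prediction step, and a pose estimate, such as one output by a neural network, corrects it. All names, noise values, and the one-dimensional state are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrackState:
    pos: float          # estimated position (m)
    vel: float          # estimated velocity (m/s)
    p00: float = 1.0    # variance of the position estimate
    p01: float = 0.0    # position/velocity covariance
    p11: float = 1.0    # variance of the velocity estimate


def predict(s, force, mass, dt, q_pos=1e-4, q_vel=1e-2):
    """Propagate the state using the sensed force as a control input."""
    acc = force / mass
    pos = s.pos + s.vel * dt + 0.5 * acc * dt * dt
    vel = s.vel + acc * dt
    # P' = F P F^T + Q with F = [[1, dt], [0, 1]] and diagonal Q
    p00 = s.p00 + 2.0 * dt * s.p01 + dt * dt * s.p11 + q_pos
    p01 = s.p01 + dt * s.p11
    p11 = s.p11 + q_vel
    return TrackState(pos, vel, p00, p01, p11)


def update(s, measured_pos, meas_var):
    """Correct the state with a position measurement (e.g., a pose estimate)."""
    innovation_var = s.p00 + meas_var
    k0 = s.p00 / innovation_var   # Kalman gain for position
    k1 = s.p01 / innovation_var   # Kalman gain for velocity
    residual = measured_pos - s.pos
    return TrackState(
        pos=s.pos + k0 * residual,
        vel=s.vel + k1 * residual,
        p00=(1.0 - k0) * s.p00,
        p01=(1.0 - k0) * s.p01,
        p11=s.p11 - k1 * s.p01,
    )
```

A particle filter would replace the Gaussian state above with a set of weighted samples, which is better suited to the discontinuities that contact introduces.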

Function approximation, e.g., deep neural networks trained on camera images, can be used to estimate a dynamic, e.g., inertia, friction, etc., and geometric (shape) model of all novel objects that are tracked by the system, e.g., using the bounding boxes. The coffee can example used above would not require this process, because it can be expected that every coffee can is identical to within the accuracy that the sensing and the control frequency requirements bring to bear on the problem. By way of contrasting example, boxes will exhibit different surface friction depending on the material of the box, e.g., corrugated, plastic, etc., and the location on the box. For example, if there is a shipping label placed on part of the box, then this can affect surface friction. Similarly, a neural network can infer the geometry of the obscured part of a box from a single image showing part of the box.

If an object is articulated, a kinematic model of the object can be estimated as well. Examples include doors, cabinets, drawers, ratchets, steering wheels, bike pumps, etc. The ground and any other stationary parts of the environment are modeled as having infinite inertia, making them immobile. Function approximation (e.g., deep neural networks trained on pressure fields) can be used to estimate the 3D poses of objects that the robot is contacting (and thereby possibly occluding from the RGB-D sensor that would normally be used for this purpose).

Kinematic commands (“desireds”) for the robot can be accepted for each object that the robot attempts to manipulate. The desireds can come from behaviors. A behavior can be either a fast-to-compute reactive policy, such as a look up table that maps, e.g., the joint estimated state of the robot and manipuland to a vector of motor commands, or can include deliberative components, or planners, e.g., a motion planner that determines paths for the robot that do not result in contact with the environment. In the case of a planner, the output will be a time-indexed trajectory that specifies position and derivatives for the robot and any objects that the robot wants to manipulate.
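To make the notion of a time-indexed trajectory concrete, the following sketch evaluates a single coordinate moving between two rest positions along a cubic polynomial, returning position, velocity, and acceleration at a queried time. The cubic profile and the function name are assumptions chosen for illustration; a planner could output any trajectory representation.

```python
def cubic_trajectory(p0, p1, duration, t):
    """Return (position, velocity, acceleration) at time t.

    The motion starts at rest at p0 (t = 0) and ends at rest at p1
    (t = duration), following the cubic time scaling 3s^2 - 2s^3.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    if not 0.0 <= t <= duration:
        raise ValueError("t must lie within [0, duration]")
    s = t / duration                 # normalized time in [0, 1]
    delta = p1 - p0
    pos = p0 + delta * (3.0 * s**2 - 2.0 * s**3)
    vel = delta * (6.0 * s - 6.0 * s**2) / duration
    acc = delta * (6.0 - 12.0 * s) / duration**2
    return pos, vel, acc
```

Sampling this function at the control rate yields the sequence of desireds that a downstream controller would track.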

In turn, the planner can use high level specifications, e.g., put the box on the table, to compute the output trajectories. This process is where motion planning comes into play.

By inverting the dynamics model of the robot (from a=F/m to F=ma) and modeling contact interactions as mass-spring-damper systems, the forces necessary to apply to the robot's actuators can be computed in order to produce forces on contacting objects, and thereby move both them and the robot as commanded.
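A one-degree-of-freedom sketch of this computation follows. It models a single body pressing against an object, with the contact represented as a spring and damper that act only while the bodies interpenetrate. The function names, parameters, and sign convention (the contact force opposes the actuator) are assumptions made for illustration.

```python
def contact_force(penetration, penetration_rate, stiffness, damping):
    """Spring-damper contact force magnitude; zero when not in contact.

    The result is clamped at zero so the contact can push but never pull.
    """
    if penetration <= 0.0:
        return 0.0
    return max(0.0, stiffness * penetration + damping * penetration_rate)


def actuator_force(mass, desired_accel, f_contact):
    """Inverse dynamics for one body: F_actuator - f_contact = m * a."""
    return mass * desired_accel + f_contact
```

For example, accelerating a 2 kg body at 1.5 m/s² while it is pressed 1 mm into a contact of stiffness 10,000 N/m requires roughly 3 N for the acceleration plus 10 N to overcome the contact.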

If the force/torque data is available, then the sensed forces on the robot can be compared against the forces predicted by the dynamics model. If, after applying some filtering as necessary, the forces are significantly different, the robot can halt its current activity and act to re-sense its environment, i.e., reconcile its understanding of the state of its surroundings with the data it is perceiving. For example, a grasped 1 kg box might slip from the robot's end effectors' grasp while the robot is picking the object. At the time that the object slips from the robot's grasp, the robot's end effector would accelerate upward, since less force would be pulling the end effector downward, while the dynamics model, which assumes the object is still within the robot's grasp, might predict that the end effector would remain at a constant vertical position. When the disparity between the actual end-effector acceleration and predicted end-effector acceleration becomes greater than the model's bounds of accuracy, it becomes clear that the estimated model state is direly incorrect. For a picking operation, we expect this mismatch to occur due to a small number of incidents: an object has been inadvertently dropped, multiple objects have been inadvertently grasped, e.g., the robot intended to grab one object but grabbed two, the robot struck a human in the workspace, or the robot inaccurately sensed the workspace, causing it to inadvertently collide with the environment, i.e., the robot failed to sense an object's existence or it improperly parameterized a sensed object, e.g., estimating a box was small when it was really large.
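The comparison of sensed and predicted forces can be sketched as a filtered residual check. The class below is a hypothetical illustration: the exponential low-pass filter, the threshold, and the smoothing factor are assumptions, not values given in this disclosure.

```python
class ForceResidualMonitor:
    """Flags a mismatch between sensed and model-predicted force."""

    def __init__(self, threshold, alpha=0.2):
        self.threshold = threshold   # allowed residual magnitude (N)
        self.alpha = alpha           # low-pass smoothing factor in (0, 1]
        self.filtered = 0.0          # filtered residual (N)

    def step(self, sensed, predicted):
        """Update with one sample; return True when re-sensing is warranted."""
        residual = sensed - predicted
        # First-order low-pass filter to reject sensor noise spikes
        self.filtered += self.alpha * (residual - self.filtered)
        return abs(self.filtered) > self.threshold
```

In the dropped-box example, the model keeps predicting the roughly 9.81 N weight of a 1 kg box while the sensor reads zero after the slip, so the filtered residual climbs past a 3 N threshold within two samples at the default smoothing factor.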

FIG. 3 is a diagram illustrating an example robot 104 in accordance with certain embodiments. Conventional robotics lacks the ability to build automated machines for manipulating objects of various shapes, sizes, inertias, and materials. Simply attaching robotic grippers to industrial robot arms barely allows “picking” objects reliably, much less the various ways a robot could interact with its environment: through pinching, stretching, tying, squeezing, brushing, scratching, etc. As a result, conventional robots are very limited in how they can physically interact with their environments, and most manual labor cannot be automated.

The hardware solution of FIG. 3 uses a simpler design than a conventional robotic arm. It has simple geometry where it touches objects and it has simple kinematics of movement. The design comprises at least two independent “chopsticks” 302, each of which has nominally five degrees of freedom of movement: linear translation along all three Cartesian dimensions as well as “pitch” and “yaw” rotations. In certain embodiments, the degrees-of-freedom can be decreased to as few as four to simplify the mechanism, which would still be suitable for certain applications, or increased to more than five, to allow redundancy of movement which helps with, e.g., obstacle avoidance.

Rotational degree-of-freedom joints can be substituted for the linear degree-of-freedom joints for joints 106.

Chopstick 302 can be affixed to a part of the environment, e.g., bolted into the ground via base 304, or it can be attached to a wheeled or a legged base. The two or more chopsticks 302 can then work together. Two or more chopsticks can also work together when attached to different bases.

The chopsticks 302 can comprise a very stiff body, of similar shape and proportions to a pool cue. Since the actual dimensions can be scaled, larger robotic systems for manipulating medium- and large-sized objects can be configured as can small robotic systems for assembling, e.g., phones.

Each chopstick 302 can be driven along five degrees of freedom of movement with an electromagnetic actuator for each degree of freedom. Other embodiments provide as few as four or as many as six independent degrees of freedom of motion to the chopstick end effector. Each actuator can use a low-gearing ratio (approximately <10:1) to minimize “passive impedance” and maximize “transmission transparency”. Given an actuator with high enough output torque, a speed reducer (transmission) can be removed entirely. A “frameless” motor embedded into the chassis of the robot, implemented as linear drives and rotary direct drives, can provide a transmission ratio of 1:1, as either the coil or the stator of the electromagnetic motor acts directly as the output of the actuator, avoiding mechanical frictions and other losses that are introduced by transmission. The functional mapping from each actuator's motor current to torque along the axis of motion is stored, and used in concert with a calibrated model of the robot for effective impedance control.
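One simple way such a stored mapping could be represented is a table of measured (current, torque) pairs with linear interpolation between them. The sketch below assumes this representation; the function name and the calibration values are hypothetical.

```python
from bisect import bisect_right


def interpolate_torque(current, table):
    """Linearly interpolate torque (N*m) for a motor current (A).

    `table` is a list of (current, torque) pairs sorted by strictly
    increasing current. Queries outside the measured range are clamped
    to the nearest measured torque.
    """
    currents = [c for c, _ in table]
    if current <= currents[0]:
        return table[0][1]
    if current >= currents[-1]:
        return table[-1][1]
    i = bisect_right(currents, current)      # first entry above the query
    (c0, t0), (c1, t1) = table[i - 1], table[i]
    return t0 + (t1 - t0) * (current - c0) / (c1 - c0)


# Hypothetical calibration data obtained by physical experimentation
CALIBRATION = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.9), (4.0, 1.5)]
```

Inverting the same table (torque to current) gives the command needed to produce a torque requested by the controller.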

An impedance controller takes desired task-space positions, velocities, and accelerations (describing the kinematics of the robot's end-effector or another point on the robot) and current joint-space positions and velocities as input and, using a set of variable “gain” parameters and a multi-body model of the robot, produces a vector of forces to be applied at the robot's joints that realizes the task-space kinematics to an accuracy commensurate with the fidelity of the robot's rigid body model. For electromagnetic actuators, these forces are controlled by varying the voltage applied to the motors. In turn, the force produced as the voltage varies is determined either through pure physical experimentation, e.g., measuring applied torque corresponding to a given voltage, and interpolating between the various measured values, or through a motor model, e.g., fitting parameters to a mathematical model through physical experimentation.
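The structure of such a controller can be sketched for a planar two-link arm, which stands in here for the robot's multi-body model. The sketch is a simplification made under stated assumptions: it uses scalar stiffness and damping gains, omits the acceleration feedforward and gravity terms that a full model-based controller would include, and all function names and link lengths are hypothetical.

```python
import math


def forward_kinematics(q, l1, l2):
    """End-effector position (x, y) of a planar two-link arm."""
    q1, q2 = q
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2))


def jacobian(q, l1, l2):
    """2x2 Jacobian mapping joint velocities to end-effector velocity."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return ((-l1 * s1 - l2 * s12, -l2 * s12),
            (l1 * c1 + l2 * c12, l2 * c12))


def impedance_torques(q, qd, x_des, v_des, stiffness, damping, l1=1.0, l2=1.0):
    """Joint torques that pull the end effector toward the desired motion."""
    x = forward_kinematics(q, l1, l2)
    jac = jacobian(q, l1, l2)
    # Current task-space velocity: v = J(q) * qd
    v = (jac[0][0] * qd[0] + jac[0][1] * qd[1],
         jac[1][0] * qd[0] + jac[1][1] * qd[1])
    # Virtual spring-damper force in task space
    force = tuple(stiffness * (xd - xi) + damping * (vd - vi)
                  for xd, xi, vd, vi in zip(x_des, x, v_des, v))
    # Map the task-space force to joint torques: tau = J(q)^T * force
    tau1 = jac[0][0] * force[0] + jac[1][0] * force[1]
    tau2 = jac[0][1] * force[0] + jac[1][1] * force[1]
    return tau1, tau2
```

Lower stiffness makes the end effector more compliant under contact, while higher stiffness tracks the desired position more tightly; varying these gains is what distinguishes impedance control from pure position control.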

A 6-axis force/torque sensor 306 can be mounted inline between the actuators and each chopstick. This allows sensing the totality of contact-based forces acting on the chopstick 302. Further, the actuators for each chopstick 302 can be powered either through a battery or a mains connection, e.g., using household voltage, 120V in the U.S.

A force torque (FT) sensor is an electronic device that is designed to monitor, detect, record and regulate linear and rotational forces exerted upon it. In other words, the FT sensor in a robotic or mechanical system can be compared to the micro-receptors in skin that equip animals with the sense of “touch.”

As a contact sensor, it is specifically designed to interact with physical objects in its environment. In order to mitigate interference from sound waves and debris, such a sensor is designed to operate in a variety of climates and external conditions. Depending on the model and intended function, a force torque sensor is able to send digital or analog signals, and measure static or dynamic forces.

The most popular type of force torque sensor is the six-axis sensor, such as can be used for sensor 306. Such a sensor is capable of measuring forces in every direction. A six-axis FT sensor generally utilizes strain gauge technology; when pressure is applied, the resistance within the gauge increases or decreases proportionally to the force it receives. This is how the sensor measures the movement of its external frames in relation to one another. Six-axis sensors can be found in robotic arms at the “wrist joint” preceding a gripper or tool end effector.

Force torque sensors are implemented in a myriad of applications and mechanisms. Most notably, FT sensors have advanced “cobot” or “collaborative robot” technology. Cobots are designed to work in tandem with human beings, taking on both the time-consuming work such as driving screws in an efficient manner, and physically demanding tasks such as lifting automobile parts in assembly lines.

Sensor 306 is mounted inline between the “butt” of the chopstick 302 and the remainder. If someone touches any part of the chopstick 302 that is mounted to the tool-side of the sensor, a force and torque will be detected on the sensor. Centrifugal and Coriolis forces will also be detected, i.e., if the robot moves quickly, a force will be detected at the sensor; however, these forces can be subtracted from those sensed using the multi-body dynamics model of the robot and the estimates of the robot's positions and velocities at the joints.

Each chopstick 302 can also be constructed from carbon fiber to minimize weight and maximize stiffness. The minimized weight allows maximizing movement speed, payload, or some function of the two. A soft skin with the compliance of rubber (approximately 10 MPa) can cover the chopstick 302. The skin can be instrumented with an array of tactile sensors, so that pressure location and magnitude can be detected at any point on the skin.

Robots that must reposition their grasp on a held object typically place it on a nearby surface and pick it up from a new angle, much as a person might do when trying to turn an item over using only one hand. Where a person might use two hands for more dexterous manipulation tasks (e.g., solving a puzzle box), multiple instances of the chopstick manipulation platform might work together to perform more complex tasks. For a robot whose gantry occupies a cubic volume in space such as the embodiment in FIGS. 4A and B, potentially 27 chopstick platforms (54 chopstick end effector “fingers”, in an arrangement that looks like a 3×3 Rubik's cube) are able to reach a volume of space at the center point of the robot cluster where the complex task might take place. Permutations of the aforementioned modular, cooperative configuration can permit multiple chopstick platforms to achieve highly complex tasks where one robot might not have sufficient dexterity.

In one embodiment, a hard plastic “fingernail” extends, e.g., up to 10 mm beyond the chopstick and serves as a rigid tool for manipulation (e.g., cutting and scraping actions).

Logistics companies spend billions yearly on manual labor to “pick” packages of various sizes and shapes (e.g., tires, boxes, mail totes) from constrained spaces, e.g., semi-trailers, as opposed to open spaces, like warehouse floors. Once picked, the object can be placed onto a new surface or into a new (possibly constrained) space, including onto a conveyor belt, into a box, etc. In certain embodiments, the work space 102 can be a “pick and place” workspace in which a combined hardware/software solution for the pick and place problem is deployed.

In such an embodiment, the manipulators 302 can be configured to reach into a confined space. A kinematic and geometric model of each manipulator is available (often provided for free or can be constructed easily). Each manipulator 302 possesses joint position sensors, like optical encoders, that allow its geometric configuration to be computed at any time.

As noted above, each manipulator is either affixed to the environment, e.g., via base 304, or attached to a mobile (e.g., wheeled or legged) base that is capable of reaching to every point in the constrained environment. One such mounting option: the manipulator runs on rails affixed to the environment (like a gantry that runs down the length of a semi-trailer).

Each object can then be picked and placed automatically, i.e., the robot 104 continues picking until all objects have been picked from the space, or manually, i.e., one at a time, selected using a user interface.

Robot 104 can be configured to use grippers, suction, hands or the surface of the manipulators 302 to secure (“pick”) the object. Further, sensory feedback, including but not limited to tactile sensors, six-axis force/torque sensors, and position encoders can be used to determine when the manipulator contacts someone or something in its environment unexpectedly or when the manipulator does not make contact with something that it planned (and expected) to contact.
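By way of example only, the comparison between planned and sensed contact described above might be sketched as follows. The single force-magnitude threshold, its 5 N default, and the function and label names are hypothetical simplifications; an actual system could combine tactile, force/torque, and encoder data in other ways.

```python
import math

def classify_contact(force_xyz, contact_expected, threshold_newtons=5.0):
    """Compare sensed contact against planned contact.

    force_xyz: measured force components (N), e.g., from a force/torque sensor.
    contact_expected: True if the plan expects contact at this instant.
    threshold_newtons: assumed magnitude above which contact is declared.
    """
    magnitude = math.sqrt(sum(f * f for f in force_xyz))
    in_contact = magnitude > threshold_newtons
    if in_contact and not contact_expected:
        return "unexpected_contact"   # touched someone or something unplanned
    if contact_expected and not in_contact:
        return "missed_contact"       # planned contact did not occur
    return "nominal"

status = classify_contact((0.0, 0.0, 12.0), contact_expected=False)
```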

Inverse kinematics algorithms, via analytical (closed form) solutions or numerical approaches, can be used to compute the positions of each robot's joints such that the robot will be able to pick or place a given object. This process makes use of the estimated state, i.e., position, orientation, and velocity, of the manipuland, which is obtained by a combination of the visual pose estimation system and state estimation (filtering), and the grasp database (described in Patent 001).

When an object is selected for picking (by the human, through a user interface, or automatically, by the robot), the combination of the “relative” pose (returned by the grasp database) and the manipuland's estimated pose gives target pose(s) for the robot's end-effector(s). The inverse kinematics procedure turns these target poses into joint-space positions necessary to realize the target poses.
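The two steps above, composing the “relative” grasp pose with the manipuland's estimated pose and then solving inverse kinematics, can be illustrated with the following minimal planar sketch. It assumes poses of the form (x, y, theta), a two-link arm with a closed-form solution, and made-up numeric values; the function names are hypothetical and this is not the grasp database or solver of the described system.

```python
import math

def compose(pose_a, pose_b):
    """Compose planar poses (x, y, theta); pose_b is expressed in pose_a's frame."""
    xa, ya, ta = pose_a
    xb, yb, tb = pose_b
    return (xa + xb * math.cos(ta) - yb * math.sin(ta),
            ya + xb * math.sin(ta) + yb * math.cos(ta),
            ta + tb)

def two_link_ik(x, y, l1, l2):
    """Closed-form IK for a planar two-link arm; returns (q1, q2) or None."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0:
        return None                      # target is outside the reachable annulus
    q2 = math.acos(c2)                   # one of the two elbow solutions
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2

object_pose = (0.6, 0.2, math.pi / 2)    # estimated manipuland pose (hypothetical)
relative_grasp = (0.1, 0.0, 0.0)         # "relative" pose from a grasp database (hypothetical)
target = compose(object_pose, relative_grasp)
joints = two_link_ik(target[0], target[1], l1=0.5, l2=0.5)
```

Only the position part of the target pose is solved here, because a two-link planar arm cannot independently control end-effector orientation; a manipulator with more joints would typically rely on a numerical solver instead.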

Cameras and/or other electromagnetic-radiation-based sensors, like LIDAR and infrared projectors, that can “see” (observe) into every constrained space can also be used. These sensors can be mounted inside the constrained space, mounted on the manipulator(s), or both. The electromagnetic-radiation-based sensors allow perceiving the objects in the environment.

These perceptions are input to a motion planner (described below), which ultimately needs to move the manipulator(s) to the correct configurations such that the gripper, etc. is able to pick or place an object. These perceptual inputs to the planner can be computed in several ways, for example: 1. The sensors can be used to identify objects to be picked, from which parameters can be fit (“learned”) to a catalog of known models. Given the model, the sensory data, and a mathematical function for predicting the observations from the model given a pose, an estimate of the configuration of the object, e.g., a 3D pose if the object is nearly rigid, is computed for this object. The system stores configurations that allow the robot to perform the final approach for a pick, and one of these configurations, e.g., one chosen arbitrarily or the one that best optimizes some criterion, serves as the input to the planner. Or, 2. A functional mapping from sensory data to the configuration that the robot needs to be in to pick or place the object is stored; the mapping data can be generated automatically using simulations or can be learned from user demonstrations. That robot configuration serves as the input to the planner.
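As one hedged illustration of the first option, the sketch below estimates the pose of a nearly rigid object from a known model, observed points, and a function that predicts observations from the model given a pose. It assumes a planar pose (x, y, theta), known point correspondences between model and observation, and a closed-form least-squares alignment; these choices, the function names, and the box dimensions are assumptions for illustration rather than the estimator of the described system.

```python
import math

def predict(model_points, pose):
    """Observation function: where the model's points appear for a given pose."""
    tx, ty, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    return [(tx + c * x - s * y, ty + s * x + c * y) for x, y in model_points]

def estimate_pose_2d(model_points, observed_points):
    """Least-squares rigid alignment of model points onto observed points."""
    n = len(model_points)
    mcx = sum(p[0] for p in model_points) / n
    mcy = sum(p[1] for p in model_points) / n
    ocx = sum(p[0] for p in observed_points) / n
    ocy = sum(p[1] for p in observed_points) / n
    dot = cross = 0.0
    for (mx, my), (ox, oy) in zip(model_points, observed_points):
        ax, ay = mx - mcx, my - mcy      # centered model point
        bx, by = ox - ocx, oy - ocy      # centered observed point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)       # rotation minimizing squared error
    tx = ocx - (mcx * math.cos(theta) - mcy * math.sin(theta))
    ty = ocy - (mcx * math.sin(theta) + mcy * math.cos(theta))
    return tx, ty, theta

box_model = [(0.0, 0.0), (0.4, 0.0), (0.4, 0.3), (0.0, 0.3)]   # hypothetical catalog entry
observed = predict(box_model, (1.0, 2.0, math.radians(30)))    # synthetic noise-free data
pose = estimate_pose_2d(box_model, observed)
```

With real sensory data, correspondences are generally unknown and the observations are noisy, so an iterative or filtering-based estimator would likely be needed in place of this closed-form step.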

A high-fidelity model of every constrained space can then be generated. That model can be parametric, implying, e.g., that the location of a standard semi-trailer need not be fixed. System identification via, e.g., least squares (or any other type of nonlinear regression) can be used to map the sensory data to the best parameterization of the model that “explains” the sensory data.
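For illustration only, the following sketch fits one parameterized element of such a model, a wall represented as the line y = slope * x + intercept, to range points using ordinary least squares. The linear wall parameterization, the function name, and the sample points are assumptions; a full model of a constrained space would have more parameters and might require nonlinear regression.

```python
def fit_line_least_squares(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Hypothetical range returns (meters) from a wall that is slightly skewed
# relative to the sensor frame.
wall_points = [(0.0, 1.25), (1.0, 1.30), (2.0, 1.35), (3.0, 1.40)]
slope, intercept = fit_line_least_squares(wall_points)
```

The fitted slope and intercept play the role of the parameterization that best “explains” the sensory data, so the wall need not be at a fixed, pre-surveyed location.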

A planner can be implemented that: 1. takes as input current (for picking) and desired (for placing) configurations for objects to be manipulated; 2. applies sample-based motion-planning algorithms to compute a series of contact-free configurations for the robot through a constrained space; 3. uses trajectory optimization, or just polynomial or other splines, to fit trajectories between the initial and final robot configurations while containing the contact-free path points; and 4. can keep objects at particular orientations during the picking/placing process by introducing kinematic constraints into the sample-based planning process.
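The trajectory-fitting step (item 3) can be sketched, under simplifying assumptions, as a chain of cubic polynomial segments that pass through each contact-free configuration. The sketch assumes the waypoints have already been produced by a sample-based planner, uses a fixed duration per segment, and brings the robot to rest at every waypoint; the function names and waypoint values are hypothetical.

```python
def cubic_segment(q_start, q_end, duration, t):
    """Interpolate between two configurations with zero velocity at both ends."""
    s = min(max(t / duration, 0.0), 1.0)      # normalized time in [0, 1]
    blend = 3.0 * s * s - 2.0 * s ** 3        # cubic with zero slope at s=0 and s=1
    return [a + (b - a) * blend for a, b in zip(q_start, q_end)]

def sample_trajectory(waypoints, segment_duration, steps_per_segment):
    """Chain cubic segments so the trajectory contains every waypoint."""
    trajectory = [list(waypoints[0])]
    for q_start, q_end in zip(waypoints, waypoints[1:]):
        for k in range(1, steps_per_segment + 1):
            t = segment_duration * k / steps_per_segment
            trajectory.append(cubic_segment(q_start, q_end, segment_duration, t))
    return trajectory

# Hypothetical contact-free joint-space waypoints (radians) from a planner.
path = [[0.0, 0.0], [0.5, 1.0], [1.0, 1.0]]
trajectory = sample_trajectory(path, segment_duration=2.0, steps_per_segment=4)
```

Because each segment moves along the straight joint-space line between consecutive waypoints, it stays on the edges a planner would typically have checked; a smoother spline that does not stop at each waypoint would need its intermediate samples re-checked for contact.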

One or more bin packing algorithms can also be implemented to determine target placement configurations for objects so that the constrained space can be packed tightly (e.g., air cargo containers). It is assumed that the system has a geometric description of all of the objects to be packed into the container. This geometric description might be obtained manually, from a 3D scanner (e.g., an employee at a FedEx office scans the object), or automatically, from the robot scanning the object.

Given these object models, it is then necessary to solve the bin packing problem (https://en.wikipedia.org/wiki/Bin_packing_problem). Since the problem is generally intractable to solve optimally, one of the approximation algorithms, e.g., using a heuristic like “best-fit”, must be applied instead. A solution to the bin packing problem gives a configuration of packed objects in the container. It is then up to the robotic planning and control system to provide a plan that gets the robot to place the objects in that configuration in the container.
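As a non-limiting example of such a heuristic, the sketch below applies best-fit decreasing to a one-dimensional version of the problem, in which each object is reduced to a single size and each bin has a fixed capacity. The reduction to one dimension, the sorting step, and the function name are assumptions for illustration; packing real objects into a container requires reasoning about their full 3D geometry.

```python
def best_fit_decreasing(sizes, capacity):
    """Pack sizes into bins, placing each item in the tightest bin that fits."""
    bins = []        # contents of each bin
    remaining = []   # free capacity of each bin
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError("item larger than bin capacity")
        candidates = [i for i, free in enumerate(remaining) if free >= size]
        if candidates:
            # Best fit: the bin left with the least free space after placement.
            target = min(candidates, key=lambda i: remaining[i] - size)
        else:
            bins.append([])
            remaining.append(capacity)
            target = len(bins) - 1
        bins[target].append(size)
        remaining[target] -= size
    return bins

packed = best_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
# packed == [[8, 2], [4, 4, 1, 1]]
```

The resulting assignment corresponds to the “configuration of packed objects” that the planning and control system would then be asked to realize.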

FIG. 2 is a block diagram illustrating an example wired or wireless system 550 that can be used in connection with various embodiments described herein. For example, the system 550 can be used to implement the robot control system described above and can comprise part of the robot 104 or backend servers 110. The system 550 can be a server or any conventional personal computer, or any other processor-enabled device that is capable of wired or wireless data communication. Other computer systems and/or architectures may also be used, as will be clear to those skilled in the art.

The system 550 preferably includes one or more processors, such as processor 560. Additional processors may be provided, such as an auxiliary processor to manage input/output, an auxiliary processor to perform floating point mathematical operations, a special-purpose microprocessor having an architecture suitable for fast execution of signal processing algorithms (e.g., digital signal processor), a slave processor subordinate to the main processing system (e.g., back-end processor), an additional microprocessor or controller for dual or multiple processor systems, or a coprocessor. System 550 can also include a tensor processing unit as well as motion planning processors or systems.

Such auxiliary processors may be discrete processors or may be integrated with the processor 560. Examples of processors which may be used with system 550 include, without limitation, the Pentium® processor, Core i7® processor, and Xeon® processor, all of which are available from Intel Corporation of Santa Clara, Calif.

The processor 560 is preferably connected to a communication bus 555. The communication bus 555 may include a data channel for facilitating information transfer between storage and other peripheral components of the system 550. The communication bus 555 further may provide a set of signals used for communication with the processor 560, including a data bus, address bus, and control bus (not shown). The communication bus 555 may comprise any standard or non-standard bus architecture such as, for example, bus architectures compliant with industry standard architecture (ISA), extended industry standard architecture (EISA), Micro Channel Architecture (MCA), peripheral component interconnect (PCI) local bus, or standards promulgated by the Institute of Electrical and Electronics Engineers (IEEE) including IEEE 488 general-purpose interface bus (GPIB), IEEE 696/S-100, and the like.

System 550 preferably includes a main memory 565 and may also include a secondary memory 570. The main memory 565 provides storage of instructions and data for programs executing on the processor 560, such as one or more of the functions and/or modules discussed above. It should be understood that programs stored in the memory and executed by processor 560 may be written and/or compiled according to any suitable language, including without limitation C/C++, Java, JavaScript, Perl, Visual Basic, .NET, and the like. The main memory 565 is typically semiconductor-based memory such as dynamic random access memory (DRAM) and/or static random access memory (SRAM). Other semiconductor-based memory types include, for example, synchronous dynamic random access memory (SDRAM), Rambus dynamic random access memory (RDRAM), ferroelectric random access memory (FRAM), and the like, including read only memory (ROM).

The secondary memory 570 may optionally include an internal memory 575 and/or a removable medium 580, for example a floppy disk drive, a magnetic tape drive, a compact disc (CD) drive, a digital versatile disc (DVD) drive, other optical drive, a flash memory drive, etc. The removable medium 580 is read from and/or written to in a well-known manner. Removable storage medium 580 may be, for example, a floppy disk, magnetic tape, CD, DVD, SD card, etc.

The removable storage medium 580 is a non-transitory computer-readable medium having stored thereon computer executable code (i.e., software) and/or data. The computer software or data stored on the removable storage medium 580 is read into the system 550 for execution by the processor 560.

In alternative embodiments, secondary memory 570 may include other similar means for allowing computer programs or other data or instructions to be loaded into the system 550. Such means may include, for example, an external storage medium 595 and an interface 590. Examples of external storage medium 595 may include an external hard disk drive, an external optical drive, or an external magneto-optical drive.

Other examples of secondary memory 570 may include semiconductor-based memory such as programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory (block oriented memory similar to EEPROM). Also included are any other removable storage media 580 and communication interface 590, which allow software and data to be transferred from an external medium 595 to the system 550.

System 550 may include a communication interface 590. The communication interface 590 allows software and data to be transferred between system 550 and other devices, such as possibly robot 104, camera 105 or other sensors, as well as external devices (e.g., printers), networks, or information sources. For example, computer software or executable code may be transferred to system 550 from a network server via communication interface 590. Examples of communication interface 590 include a built-in network adapter, network interface card (NIC), Personal Computer Memory Card International Association (PCMCIA) network card, card bus network adapter, wireless network adapter, Universal Serial Bus (USB) network adapter, modem, a wireless data card, a communications port, an infrared interface, an IEEE 1394 (FireWire) interface, or any other device capable of interfacing system 550 with a network or another computing device.

Communication interface 590 preferably implements industry-promulgated protocol standards, such as Ethernet IEEE 802 standards, Fibre Channel, digital subscriber line (DSL), asymmetric digital subscriber line (ADSL), frame relay, asynchronous transfer mode (ATM), integrated services digital network (ISDN), personal communications services (PCS), transmission control protocol/Internet protocol (TCP/IP), serial line Internet protocol/point to point protocol (SLIP/PPP), and so on, but may also implement customized or non-standard interface protocols as well.

Software and data transferred via communication interface 590 are generally in the form of electrical communication signals 605. These signals 605 are preferably provided to communication interface 590 via a communication channel 600. In one embodiment, the communication channel 600 may be a wired or wireless network, or any variety of other communication links. Communication channel 600 carries signals 605 and can be implemented using a variety of wired or wireless communication means including wire or cable, fiber optics, conventional phone line, cellular phone link, wireless data communication link, radio frequency (“RF”) link, or infrared link, just to name a few.

Computer executable code (i.e., computer programs or software) is stored in the main memory 565 and/or the secondary memory 570. Computer programs can also be received via communication interface 590 and stored in the main memory 565 and/or the secondary memory 570. Such computer programs, when executed, enable the system 550 to perform the various functions of the present invention as previously described.

In this description, the term “computer readable medium” is used to refer to any non-transitory computer readable storage media used to provide computer executable code (e.g., software and computer programs) to the system 550. Examples of these media include main memory 565, secondary memory 570 (including internal memory 575, removable medium 580, and external storage medium 595), and any peripheral device communicatively coupled with communication interface 590 (including a network information server or other network device). These non-transitory computer readable mediums are means for providing executable code, programming instructions, and software to the system 550.

In an embodiment that is implemented using software, the software may be stored on a computer readable medium and loaded into the system 550 by way of removable medium 580, I/O interface 585, or communication interface 590. In such an embodiment, the software is loaded into the system 550 in the form of electrical communication signals 605. The software, when executed by the processor 560, preferably causes the processor 560 to perform the inventive features and functions previously described herein.

In an embodiment, I/O interface 585 provides an interface between one or more components of system 550 and one or more input and/or output devices. Example input devices include, without limitation, keyboards, touch screens or other touch-sensitive devices, biometric sensing devices, computer mice, trackballs, pen-based pointing devices, and the like. Examples of output devices include, without limitation, cathode ray tubes (CRTs), plasma displays, light-emitting diode (LED) displays, liquid crystal displays (LCDs), printers, vacuum fluorescent displays (VFDs), surface-conduction electron-emitter displays (SEDs), field emission displays (FEDs), and the like.

The system 550 also includes optional wireless communication components that facilitate wireless communication over a voice network and/or over a data network. The wireless communication components comprise an antenna system 610, a radio system 615 and a baseband system 620. In the system 550, radio frequency (RF) signals are transmitted and received over the air by the antenna system 610 under the management of the radio system 615.

In one embodiment, the antenna system 610 may comprise one or more antennae and one or more multiplexors (not shown) that perform a switching function to provide the antenna system 610 with transmit and receive signal paths. In the receive path, received RF signals can be coupled from a multiplexor to a low noise amplifier (not shown) that amplifies the received RF signal and sends the amplified signal to the radio system 615.

In alternative embodiments, the radio system 615 may comprise one or more radios that are configured to communicate over various frequencies. In one embodiment, the radio system 615 may combine a demodulator (not shown) and modulator (not shown) in one integrated circuit (IC). The demodulator and modulator can also be separate components. In the incoming path, the demodulator strips away the RF carrier signal leaving a baseband receive audio signal, which is sent from the radio system 615 to the baseband system 620.

If the received signal contains audio information, then baseband system 620 decodes the signal and converts it to an analog signal. Then the signal is amplified and sent to a speaker. The baseband system 620 also receives analog audio signals from a microphone. These analog audio signals are converted to digital signals and encoded by the baseband system 620. The baseband system 620 also codes the digital signals for transmission and generates a baseband transmit audio signal that is routed to the modulator portion of the radio system 615. The modulator mixes the baseband transmit audio signal with an RF carrier signal generating an RF transmit signal that is routed to the antenna system and may pass through a power amplifier (not shown). The power amplifier amplifies the RF transmit signal and routes it to the antenna system 610 where the signal is switched to the antenna port for transmission.

The baseband system 620 is also communicatively coupled with the processor 560. The processor 560 has access to data storage areas 565 and 570. The processor 560 is preferably configured to execute instructions (i.e., computer programs or software) that can be stored in the memory 565 or the secondary memory 570. Computer programs can also be received from the baseband system 620 and stored in the data storage area 565 or in secondary memory 570, or executed upon receipt. Such computer programs, when executed, enable the system 550 to perform the various functions of the present invention as previously described. For example, data storage areas 565 may include various software modules (not shown).

Various embodiments may also be implemented primarily in hardware using, for example, components such as application specific integrated circuits (ASICs), or field programmable gate arrays (FPGAs). Implementation of a hardware state machine capable of performing the functions described herein will also be apparent to those skilled in the relevant art. Various embodiments may also be implemented using a combination of both hardware and software.

Furthermore, those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and method steps described in connection with the above described figures and the embodiments disclosed herein can often be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled persons can implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the invention. In addition, the grouping of functions within a module, block, circuit or step is for ease of description. Specific functions or steps can be moved from one module, block or circuit to another without departing from the invention.

Moreover, the various illustrative logical blocks, modules, functions, and methods described in connection with the embodiments disclosed herein can be implemented or performed with a general purpose processor, a digital signal processor (DSP), an ASIC, FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor can be a microprocessor, but in the alternative, the processor can be any processor, controller, microcontroller, or state machine. A processor can also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

Additionally, the steps of a method or algorithm described in connection with the embodiments disclosed herein can be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium including a network storage medium. An exemplary storage medium can be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium can be integral to the processor. The processor and the storage medium can also reside in an ASIC.

Any of the software components described herein may take a variety of forms. For example, a component may be a stand-alone software package, or it may be a software package incorporated as a “tool” in a larger software product. It may be downloadable from a network, for example, a website, as a stand-alone product or as an add-in package for installation in an existing software application. It may also be available as a client-server software application, as a web-enabled software application, and/or as a mobile application.

While certain embodiments have been described above, it will be understood that the embodiments described are by way of example only. Accordingly, the systems and methods described herein should not be limited based on the described embodiments. Rather, the systems and methods described herein should only be limited in light of the claims that follow when taken in conjunction with the above description and accompanying drawings. 

1-8. (canceled)
 9. A method for robotic object manipulation within a constrained object environment, comprising: receiving imaging data for a constrained object scene; generating an environmental representation of the constrained object scene based on the imaging data, comprising: identifying a plurality of objects within the constrained object scene; for each object of the plurality, determining a respective object state estimate and a respective object model; and determining an environmental constraint model; based on the environmental representation, determining a set of trajectories for a robot to grasp an object of the plurality based on an environmental collision evaluation and a set of heuristics, wherein the robot comprises a first five degree-of-freedom (DOF) kinematic chain mounting a first right-conical-frustum-shaped tool having a first frictional grasp surface; and moving the object with the first right-conical-frustum-shaped tool while using an impedance control scheme to maintain a force relationship between the object and the first frictional grasp surface.
 10. The method of claim 9, wherein determining the set of trajectories comprises: determining a series of contact-free configurations of the robot through the constrained object environment; and fitting a trajectory to the series of contact-free configurations.
 11. The method of claim 9, wherein moving the object with the first right-conical-frustum-shaped tool comprises translating the object while maintaining an object orientation.
 12. The method of claim 9, wherein the robot further comprises a second five DOF kinematic chain, independent of the first five DOF kinematic chain, which mounts a second right-conical-frustum-shaped tool having a second frictional grasp surface.
 13. The method of claim 12, wherein the set of trajectories comprises a first trajectory associated with the first five DOF kinematic chain and a second trajectory associated with the second five DOF kinematic chain.
 14. The method of claim 12, wherein moving the object comprises grasping the object between the first and second frictional grasp surfaces.
 15. The method of claim 9, wherein the constrained object scene comprises a loaded semi-trailer, wherein the environmental constraint model comprises a model of semi-trailer structures.
 16. The method of claim 9, wherein objects are identified within the constrained object scene using a neural network which is pre-trained to identify objects of an object class associated with the constrained object environment.
 17. The method of claim 16, wherein a stiffness of the first right-conical-frustum-shaped tool is greater than an effective stiffness of the impedance control scheme.
 18. The method of claim 9, wherein the environmental collision evaluation is determined based on pre-computed parameter values for simulated robot behaviors.
 19. The method of claim 18, wherein the pre-computed parameter values are determined using reinforcement learning for discretized state and action spaces.
 20. A robot for object manipulation, comprising: a first robotic assembly, comprising: a first tool comprising a grip surface which defines a right conical frustum about a central axis of the first tool; a force-torque sensor coupled to the first tool; and at least four actuation stages arranged in series and cooperatively connecting the first tool to a base; and a set of imaging sensors configured to generate imaging data for an object scene; a controller communicatively coupled to the first robotic assembly and the set of imaging sensors; wherein the controller is configured to: generate an environmental representation of a constrained object scene, the environmental representation comprising a model of environmental constraints which is determined based on the imaging data; and generate instructions for the first robotic assembly to grip objects between the grip surface and an opposing surface based on measurements from the force-torque sensor, wherein the instructions for the first robotic assembly are generated based on an environmental collision evaluation associated with the environmental representation.
 21. The robot of claim 20, wherein the environmental representation of the constrained object scene is further generated by: based on the imaging data, identifying a plurality of objects within the constrained object scene using a trained object detector; and for each object of the plurality, determining a respective model and a state estimate relative to a workspace of the robotic assembly.
 22. The robot of claim 20, wherein the controller is further configured to: determine a series of contact-free configurations of the robot within the constrained object environment; and fit a trajectory to the series of contact-free configurations, wherein the instructions are generated based on the trajectory.
 23. The robot of claim 22, wherein the trajectory is at least partially precomputed.
 24. The robot of claim 20, wherein the constrained object scene comprises a loaded semi-trailer, wherein the model of the environmental constraints comprises a model of semi-trailer structures.
 25. The robot of claim 20, wherein the instructions facilitate moving objects with the tool while simultaneously maintaining a grip force relationship at the grip surface using impedance control.
 26. The robot of claim 20, further comprising: a second robotic assembly, independently actuatable relative to the first robotic assembly, comprising: a second tool comprising a second grip surface which defines a second right conical frustum about a second central axis of the second tool; and a second set of at least four actuation stages arranged in series and cooperatively connecting the second tool to the base, wherein the second grip surface comprises the opposing surface.
 27. The robot of claim 26, wherein the controller is configured to control a distance between the first and second robotic assemblies based on an object size.
 28. The robot of claim 20, wherein a length of the frictional grip surface is greater than 74 cm.
 29. The robot of claim 20, wherein a length of the frictional grip surface along the central axis is at least 10 times a thickness of the first tool at a proximal end. 